Skip to content

Bump chardet from 5.2.0 to 7.4.1#1670

Closed
dependabot[bot] wants to merge 1 commit into
mainfrom
dependabot/pip/chardet-7.4.1
Closed

Bump chardet from 5.2.0 to 7.4.1#1670
dependabot[bot] wants to merge 1 commit into
mainfrom
dependabot/pip/chardet-7.4.1

Conversation

@dependabot

@dependabot dependabot Bot commented on behalf of github Apr 8, 2026

Copy link
Copy Markdown
Contributor

Bumps chardet from 5.2.0 to 7.4.1.

Release notes

Sourced from chardet's releases.

7.4.1

Bug Fixes

  • BOM-prefixed UTF-16/32 input now returns utf-16/utf-32 instead of utf-16-le/utf-16-be/utf-32-le/utf-32-be. The endian-specific codecs don't strip the BOM on decode, so callers were getting a stray U+FEFF at the start of their text. BOM-less detection is unchanged. (#364, #365)

Full Changelog: chardet/chardet@7.4.0...7.4.1

chardet 7.4.0 brings accuracy up to 99.3% (from 98.6% in 7.3.0) and significantly faster cold start thanks to a new dense model format.

What's New

Performance:

  • New dense zlib-compressed model format (v2) drops cold start (import + first detect) from ~75ms to ~13ms with mypyc

Accuracy (98.6% → 99.3%):

  • Eliminated train/test data overlap via content fingerprinting
  • Added MADLAD-400 and Wikipedia as supplemental training sources
  • Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training and weighted by per-bigram IDF
  • Encoding-aware substitution filtering (substitutions only apply for characters the target encoding can't represent)
  • Increased training samples from 15K to 25K per language/encoding pair

Bug fixes:

  • Added dedicated structural analyzers for CP932, CP949, and Big5-HKSCS (these were previously sharing their base encoding's byte-range analyzer, missing extended ranges)

Metrics

chardet 7.4.0 (mypyc) chardet 6.0.0 charset-normalizer 3.4.6
Accuracy (2,517 files) 99.3% 88.2% 85.4%
Speed 551 files/s 12 files/s 376 files/s
Language detection 95.7% 40.0% 59.2%

Full changelog: https://chardet.readthedocs.io/en/latest/changelog.html

7.3.0

License

  • 0BSD license — the project license has been changed from MIT to 0BSD, a maximally permissive license with no attribution requirement. All prior 7.x releases should also be considered 0BSD licensed as of this release.

Features

  • Added mime_type field to detection results — identifies file types for both binary (via magic number matching) and text content. Returned in all detect(), detect_all(), and UniversalDetector results. (#350)
  • New pipeline/magic.py module detects 40+ binary file formats including images, audio/video, archives, documents, executables, and fonts. ZIP-based formats (XLSX, DOCX, JAR, APK, EPUB, wheel, OpenDocument) are distinguished by entry filenames. (#350)

Bug Fixes

  • Fixed incorrect equivalence between UTF-16-LE and UTF-16-BE in accuracy testing — these are distinct encodings with different byte order, not interchangeable

Performance

... (truncated)

Changelog

Sourced from chardet's changelog.

7.4.1 (2026-04-07)

Bug Fixes:

  • BOM-prefixed UTF-16 and UTF-32 input now reports utf-16 and utf-32 instead of the endian-specific variants. Python's utf-16-le/utf-16-be/utf-32-le/utf-32-be codecs keep the BOM as a U+FEFF in the decoded string, while utf-16/utf-32 strip it, so callers passing the detection result directly to .decode() were getting a stray BOM at the start of their text. BOM-less UTF-16/32 detection (via null-byte patterns) is unchanged and still returns the endian-specific name. (Dan Blanchard <https://github.com/dan-blanchard>_ via Claude, [#364](https://github.com/chardet/chardet/issues/364) <https://github.com/chardet/chardet/issues/364>, [#365](https://github.com/chardet/chardet/issues/365) <https://github.com/chardet/chardet/pull/365>)

7.4.0 (2026-03-26)

Performance:

  • Switched to dense zlib-compressed model format (v2): models are now stored as contiguous memoryview slices of a single decompressed blob, eliminating per-model struct.unpack overhead. Cold start (import + first detect) dropped from ~75ms to ~13ms with mypyc. (Dan Blanchard <https://github.com/dan-blanchard>_ via Claude, [#354](https://github.com/chardet/chardet/issues/354) <https://github.com/chardet/chardet/pull/354>_)

Accuracy:

  • Accuracy improved from 98.6% to 99.3% (2499/2517 files) through a combination of training and scoring improvements:

    • Eliminated train/test data overlap by content-fingerprinting test suite articles and excluding them from training data ([#351](https://github.com/chardet/chardet/issues/351) <https://github.com/chardet/chardet/pull/351>_)
    • Added MADLAD-400 and Wikipedia as supplemental training sources to fill gaps left by exclusion filtering ([#351](https://github.com/chardet/chardet/issues/351) <https://github.com/chardet/chardet/pull/351>_)
    • Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training (instead of being crushed by global normalization), and weighted by per-bigram IDF so encoding-specific byte patterns contribute proportionally to how discriminative they are ([#352](https://github.com/chardet/chardet/issues/352) <https://github.com/chardet/chardet/pull/352>_)
    • Added encoding-aware substitution filtering: character substitutions during training now only apply for characters the target encoding cannot represent
    • Increased training samples from 15K to 25K per language/encoding pair (Dan Blanchard <https://github.com/dan-blanchard>_ via Claude)

... (truncated)

Commits
  • d9ae78d docs: changelog for 7.4.1
  • 2a54c68 Return utf-16/utf-32 (not -le/-be) when a BOM is present (#365)
  • c63c632 Address GitHub code quality findings and add missing test coverage
  • 1ad8e6a Revert "Add PyInstaller hook to collect mypyc shared runtime library (#359)" ...
  • 7fb0563 Add PyInstaller hook to collect mypyc shared runtime library (#359)
  • 2d75e6d Link to blogpost in README
  • e37cf3c fix: prevent dirty-tree version in Windows mypyc wheel builds
  • f9f5af2 Fix a couple errors in the changelog
  • 53755de chore: add .superpowers/ to .gitignore
  • 3a20df6 docs: update README examples with correct outputs
  • Additional commits viewable in compare view

@dependabot dependabot Bot added dependencies Pull requests that update a dependency file python Pull requests that update python code labels Apr 8, 2026
@codecov

codecov Bot commented Apr 8, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.69%. Comparing base (efaab69) to head (38d3ac1).
⚠️ Report is 5 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1670   +/-   ##
=======================================
  Coverage   54.69%   54.69%           
=======================================
  Files         336      336           
  Lines       27440    27440           
=======================================
  Hits        15007    15007           
  Misses      12433    12433           
Flag Coverage Δ
functionaltests 0.00% <ø> (ø)
unittests 54.69% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Bumps [chardet](https://github.com/chardet/chardet) from 5.2.0 to 7.4.1.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@5.2.0...7.4.1)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.4.1
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot Bot force-pushed the dependabot/pip/chardet-7.4.1 branch from d2b3a1f to 38d3ac1 Compare April 8, 2026 18:28
@dependabot @github

dependabot Bot commented on behalf of github Apr 13, 2026

Copy link
Copy Markdown
Contributor Author

Superseded by #1679.

@dependabot dependabot Bot closed this Apr 13, 2026
@dependabot dependabot Bot deleted the dependabot/pip/chardet-7.4.1 branch April 13, 2026 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file python Pull requests that update python code

Development

Successfully merging this pull request may close these issues.

0 participants